Conversation
Summary of Changes

Hello @zianglih, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the debugging capabilities for weight synchronization processes within the

Highlights
Code Review
This pull request introduces a valuable debug feature for weight synchronization and adds support for mxfp8 quantization. However, critical security vulnerabilities were identified, primarily related to insecure deserialization and command injection. Specifically, the use of torch.load() without weights_only=True in the new debugging utility could lead to arbitrary code execution from malicious checkpoints. Furthermore, several training scripts are vulnerable to command injection due to direct interpolation of user-supplied arguments into shell commands. Beyond these security concerns, suggestions were made to enhance code quality by refining exception handling, reducing code duplication, and cleaning up module exports. Addressing these security issues is paramount.
    )
    return safe_open(path, framework="pt", device="cpu")
if self.fmt == "bin":
    obj = torch.load(path, map_location="cpu")
The use of torch.load() without weights_only=True is insecure as it relies on the pickle module, which can execute arbitrary code during deserialization. An attacker could provide a malicious checkpoint file that, when loaded for debugging or comparison, executes arbitrary commands on the system. It is highly recommended to use weights_only=True to restrict deserialization to safe types.
-obj = torch.load(path, map_location="cpu")
+obj = torch.load(path, map_location="cpu", weights_only=True)
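The danger described above comes from pickle's `__reduce__` hook, which `torch.load`'s default (non-`weights_only`) path honors. A minimal stdlib-only sketch of the mechanism, with no torch dependency (`MaliciousPayload` is a hypothetical name for illustration):

```python
import pickle

class MaliciousPayload:
    # __reduce__ tells pickle what to call at load time; torch.load without
    # weights_only=True goes through this same pickle machinery, so a crafted
    # checkpoint can trigger arbitrary calls (os.system, eval, ...).
    def __reduce__(self):
        return (eval, ("6 * 7",))

blob = pickle.dumps(MaliciousPayload())
result = pickle.loads(blob)  # eval("6 * 7") runs during deserialization
print(result)  # 42
```

With `weights_only=True`, the unpickler is restricted to tensor and primitive types, and loading such a payload raises an error instead of executing it.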
U.exec_command(
    f"huggingface-cli download Qwen/{args.model_name}-FP8 --local-dir /root/models/{args.model_name}-FP8"
)
The model_name argument is directly interpolated into a shell command string without sanitization. This allows for command injection if an attacker can control the model_name parameter. For example, a model_name like ; touch /tmp/pwned would result in the execution of the injected command. Use shlex.quote() to sanitize any variables used in shell commands.
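A minimal sketch of the recommended fix, assuming the attacker-controlled value shown here is hypothetical:

```python
import shlex

# hypothetical attacker-controlled value
model_name = "Qwen3-4B; touch /tmp/pwned"

# Unsafe: the ';' splits the command when the string reaches a shell
unsafe = f"huggingface-cli download Qwen/{model_name}-FP8"

# Safe: shlex.quote() wraps the whole argument in single quotes so the
# shell treats it as a single token
safe = f"huggingface-cli download {shlex.quote(f'Qwen/{model_name}-FP8')}"
print(safe)
```

Splitting the two strings with `shlex.split` shows the difference: the quoted version stays a single argument, while the unquoted one breaks into extra tokens, including `touch`.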
scripts/run_qwen3_30b_a3b.py
Outdated
U.exec_command(
    f"python tools/convert_hf_to_mxfp8.py --model-dir /root/models/{args.model_name} --save-dir {mxfp8_path}"
)
scripts/run_qwen3_4b.py
Outdated
-if args.rollout_fp8:
+if args.rollout_fp8 and not use_blackwell_fp8:
     U.exec_command(f"hf download Qwen/{args.model_name}-FP8 --local-dir /root/models/{args.model_name}-FP8")
scripts/run_qwen3_4b.py
Outdated
U.exec_command(
    f"python tools/convert_hf_to_mxfp8.py --model-dir /root/models/{args.model_name} --save-dir {mxfp8_path}"
)
-__all__ = ["remove_padding", "quantize_param", "quantize_params_fp8", "quantize_params_compressed_tensors"]
+__all__ = [
+    "remove_padding",
+    "quantize_param",
The __all__ list includes quantize_param, but this function is not defined or imported in this module. This appears to be a pre-existing issue, but since this block is being modified, it's a good opportunity to correct it. Removing this line will prevent AttributeError exceptions on star imports (from module import *) and improve code clarity.
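A small self-contained sketch of why a stale `__all__` entry matters; the `demo_quant` module and its names are made up here to mirror the flagged `quantize_param` entry:

```python
import sys
import types

# Build a throwaway module whose __all__ names an attribute that was never
# defined, mirroring the stale "quantize_param" entry in the real module.
mod = types.ModuleType("demo_quant")
mod.__all__ = ["remove_padding", "quantize_param"]
mod.remove_padding = lambda x: x   # defined
sys.modules["demo_quant"] = mod    # quantize_param deliberately missing

star_import_error = None
try:
    exec("from demo_quant import *")
except AttributeError as exc:      # raised for the missing __all__ entry
    star_import_error = exc
print(star_import_error)
```

Plain `import demo_quant` would still succeed; only the star import trips over the dangling name, which is why such entries tend to go unnoticed.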
# experts
expert_pattern = r"mlp.experts\.(.+)\.weight(\d+)"
match = re.match(expert_pattern, rest)
if match:
    rest, expert_idx = match.groups()
    if rest in [
        "linear_fc1",
        "linear_fc2",
    ]:
        quantize_named_params = []
        for converted_name, param in converted_named_params:
            # skip bf16 weight_scale and input_scale
            # TODO: find a clearer way.
            if converted_name.endswith("_scale"):
                continue
            quantize_named_params.extend(_quantize_param(converted_name, param))

        return quantize_named_params

# shared expert
shared_expert_pattern = r"mlp.shared_experts\.(.+)"
match = re.match(shared_expert_pattern, rest)
if match:
    rest = match.groups()[0]
    if rest in [
        "linear_fc1.weight",
        "linear_fc2.weight",
    ]:
        quantize_named_params = []
        for converted_name, param in converted_named_params:
            quantize_named_params.extend(_quantize_param(converted_name, param))

        return quantize_named_params

if rest in [
    "self_attention.linear_proj.weight",
    "self_attention.linear_qkv.weight",
    "mlp.linear_fc1.weight",
    "mlp.linear_fc2.weight",
    # mla
    "self_attention.linear_q_proj.weight",
    "self_attention.linear_q_down_proj.weight",
    "self_attention.linear_q_up_proj.weight",
    "self_attention.linear_kv_down_proj.weight",
    "self_attention.linear_kv_up_proj.weight",
]:
    quantize_named_params = []
    for converted_name, param in converted_named_params:
        quantize_named_params.extend(_quantize_param(converted_name, param))

    return quantize_named_params
There is significant code duplication in how quantization is applied for different layer types (experts, shared experts, and other linear layers). The logic to iterate over converted_named_params and call _quantize_param is repeated.
This could be refactored into a helper function to improve maintainability and readability. For example:
def _apply_quantization(converted_named_params, skip_scales=False):
    quantized_params = []
    for name, param in converted_named_params:
        if skip_scales and name.endswith("_scale"):
            continue
        quantized_params.extend(_quantize_param(name, param))
    return quantized_params

# ... inside quantize_params_mxfp8, you can then determine if quantization is needed
# and call the helper, e.g.:
# if should_quantize:
#     return _apply_quantization(converted_named_params, skip_scales=is_expert_layer)

except Exception:  # pragma: no cover - optional dependency
    safe_open = None
Catching a broad Exception for an optional import can hide other unexpected errors. It's better to catch the specific ImportError that occurs when the optional dependency is not installed.
-except Exception:  # pragma: no cover - optional dependency
+except ImportError:  # pragma: no cover - optional dependency
     safe_open = None
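The optional-import pattern the suggestion targets can be sketched as follows; `definitely_not_installed_pkg` is a deliberately nonexistent placeholder, and note that `ModuleNotFoundError` is a subclass of `ImportError`, so the narrower clause still covers the missing-package case:

```python
try:
    # hypothetical optional dependency, absent in this environment
    from definitely_not_installed_pkg import safe_open  # type: ignore
except ImportError:  # also catches ModuleNotFoundError
    safe_open = None

# Callers branch on availability instead of crashing at import time
if safe_open is None:
    print("optional fast path unavailable, falling back")
```

A broad `except Exception` here would also swallow unrelated failures raised while the dependency initializes (bad config, C-extension ABI mismatches), masking real bugs as "dependency not installed".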
except Exception as exc:  # pragma: no cover - optional dependency
    logger.warning(
        "Cannot resolve HF repo id %s (huggingface_hub unavailable): %s",
        path_or_repo,
        exc,
    )
Similar to the previous comment, catching a broad Exception for an optional import can mask other issues. It's more precise to catch ImportError here as well.
-except Exception as exc:  # pragma: no cover - optional dependency
+except ImportError as exc:  # pragma: no cover - optional dependency
     logger.warning(
         "Cannot resolve HF repo id %s (huggingface_hub unavailable): %s",
         path_or_repo,
         exc,
     )
Thanks for the contribution. I am looking into this PR.
@HumansAnd
This PR conducts a per-tensor bit-exact check for the first weight sync.
Example: